Addressing Users' Privacy Concerns for Improving Personalization Quality: Towards an Integration of User Studies and Algorithm Evaluation
Authors
Abstract
Numerous studies have demonstrated the effectiveness of personalization using quality criteria both from machine learning / data mining and from user studies. However, a site requires more than a high-performance personalization algorithm: it needs to convince its users to input the data needed by the algorithm. Today’s Web users are becoming increasingly privacy-conscious and less willing to disclose personal data. How can the advantages of personalization (and hence, of disclosure) be communicated effectively, and how can the success of such strategies be measured in terms of improved personalization quality? In this paper, we argue for a tighter integration of the HCI and computational issues involved in these questions. We first outline the problems for personalization that arise from the combination of users’ privacy concerns and sites’ current policies of dealing with privacy issues. We then describe the results of an experiment that investigated the effects of changes to a site’s interface on users’ willingness to disclose data for personalization. This is followed by an overview of studies of the sensitivity of mining algorithms to changes in the availability of these types of data. Based on this, we outline a research agenda for future evaluation studies and user agent design.

Various personalization systems have been developed in recent years and their benefits described [26, 27]. Personalized systems require data about individuals to successfully adapt to the user. However, users are becoming more and more concerned about their privacy. A meta-study of 30 surveys has shown that Internet users strongly dislike the collection and use of personal data [45]. These privacy concerns represent a major impediment to a more widespread use of personalization [29] and user-adaptive e-commerce [14].
Yet, current Web privacy statements are typically written as if site operators did not want users to read them: whereas 76% of respondents indicated that they find privacy policies very important [16], it has been found that users hardly pay any attention to them (see footnote 2).

Footnote 1: This research was supported by the Deutsche Forschungsgemeinschaft, Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316/2).

Footnote 2: For example, on the day after the company Excite@home was featured in a 60 Minutes segment on Internet privacy, only 100 out of 20 million unique visitors accessed that company’s privacy pages. Many site managers claim that fewer than 1% of all users read privacy policies [27].

Bettina Berendt and Maximilian Teltzrow

This situation has left site operators and researchers wondering how to better communicate their data collection policies and the advantages arising from them. In this paper, we address this question from an evaluative and instrumental perspective. In Section 1, concerns from a selection of consumer privacy surveys are highlighted. We then outline factors influencing users’ data disclosure behavior, which in turn may impact the quality of personalization results. Section 2 describes the results of an experiment that suggests an influence of a site’s communication design on users’ willingness to share data. Section 3 then takes a more computational viewpoint and discusses quantitative methods of measuring the influence of data availability on personalization quality. Data availability will be operationalized in terms of levels of identity disclosure. The results also show that the availability of data interacts with the personalization algorithm chosen and with site characteristics. While personalization algorithm and site characteristics are the site operator’s decision parameters, the availability of data is the user’s decision parameter.
A more user-oriented evaluation methodology will represent a shift in emphasis and require changes in methodology. Thus, in Section 4, we conclude by outlining requirements for the design of evaluation studies and site-user interfaces. In particular, we propose that an increased level of transparency of how a user’s data provision affects recommendation quality will prove beneficial for both users and sites. This work has implications for privacy research and practice, especially for managers of personalization sites. Moreover, it highlights links between HCI and computational aspects of personalization and suggests further work in the development of privacy-preserving personalization systems.

1 Problems for Personalization that Arise from Users’ Privacy Concerns and Sites’ Current Policies of Dealing with Privacy Issues

Privacy concerns are a severe drawback to personalization. In this section, we describe data categories relevant for personalization and a selection of findings from consumer studies to give an insight into current user concerns.

1.1 A Categorization of Data Used for Personalization

Personalization requires two types of knowledge: individual-user information, i.e., knowledge about the user to whom a recommendation is to be made, and background knowledge about what to recommend based on the individual-user information. The first type of knowledge consists of the (potentially personal) data that the individual user discloses; the second type consists of (i) information about the product catalog and business rules (e.g., if a user is interested in action movies, recommend Terminator to him) and (ii) patterns derived from historical data (e.g., ratings given by previous users; site navigation patterns). This distinction is reflected in Kobsa, Koenemann, and Pohl’s [27] classification into user data, usage data, environment data (all concerning the individual user), and usage regularities.
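The two types of knowledge distinguished above can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `UserInfo`, `BUSINESS_RULES`, `HISTORICAL_PATTERNS`, and `recommend` are hypothetical, and the co-occurrence counts and the titles other than Terminator are made-up toy values. Only the rule "interested in action movies, so recommend Terminator" comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UserInfo:
    """Individual-user information: what this user chose to disclose (hypothetical fields)."""
    user_id: str
    interests: set[str] = field(default_factory=set)


# Background knowledge (i): business rules over the product catalog.
# The single rule below mirrors the example given in the text.
BUSINESS_RULES = {"action movies": ["Terminator"]}

# Background knowledge (ii): patterns derived from historical data.
# Toy co-occurrence counts from previous users' ratings (made-up values).
HISTORICAL_PATTERNS = {"Terminator": {"Aliens": 3, "Casablanca": 1}}


def recommend(user: UserInfo) -> list[str]:
    """Combine disclosed user data with background knowledge."""
    recs: list[str] = []
    # Step 1: apply business rules to the interests the user disclosed.
    for interest in sorted(user.interests):
        for item in BUSINESS_RULES.get(interest, []):
            if item not in recs:
                recs.append(item)
    # Step 2: extend with items that co-occurred with the rule-based picks,
    # strongest historical pattern first.
    for item in list(recs):
        related = HISTORICAL_PATTERNS.get(item, {})
        for other, _count in sorted(related.items(), key=lambda kv: -kv[1]):
            if other not in recs:
                recs.append(other)
    return recs


disclosing = UserInfo("u1", {"action movies"})
withholding = UserInfo("u2")  # discloses no interests
print(recommend(disclosing))
print(recommend(withholding))
```

In this toy setting, the user who withholds all interests receives no recommendations, because the background knowledge has nothing to match against. This is the dependency of personalization on disclosed data that the section describes.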
The P3P classification can be regarded as a further refinement of this idea. The Platform for Privacy Preferences (P3P) [12] provides Web site managers with a standardized way to disclose how their site collects, uses, and shares personal information about users. It provides several pre-defined types of data. It specifies a “data schema” describing sets of “data elements”, which are specific items of data a service might collect online. For example, it differentiates data categories such as “physical contact information”, “unique identifiers”, “purchase information”, “computer information”, “navigation and click-stream data”, or “demographic and socioeconomic data”. Since background knowledge relies on individual-user information, in the following we will concentrate on the subclasses of individual-user information. Table 1 shows them, adding the types of data disclosed when typical shopping dialogue questions are asked (used in, e.g., [42, 45]).

A closer look at user perceptions and concerns reveals that a further criterion needs to be taken into account for classifying data. Investigating the concerns of the “pragmatic majority” [2] of users who exhibit a medium degree of privacy concerns, Spiekermann, Grossklags, and Berendt [42] found that one group was particularly concerned about disclosing aspects of their identity, while others were particularly opposed to revealing a personal profile. This distinction groups data across the previous classifications, as shown in Table 1. The resulting complexity indicates that an analysis of privacy concerns and their effects on personalization should start by focusing on specific subclasses of these concerns.

Table 1 (only the column headers survive in this excerpt): Data | Identity | Profile
Publication date: 2003